ABSTRACT

Environmental statistics provide a necessary means of comparing the properties of galaxies in different en- 
vironments and a vital test of models of galaxy formation within the prevailing, hierarchical cosmological 
model. We explore counts-in-cylinders, a common statistic defined as the number of companions of a par- 
ticular galaxy found within a given projected radius and redshift interval. Galaxy distributions with the same 
two-point correlation functions do not necessarily have the same companion count distributions. We use this 
statistic to examine the environments of galaxies in the Sloan Digital Sky Survey, Data Release 4. We also 
make preliminary comparisons to four models for the spatial distributions of galaxies, based on N-body
simulations, and data from SDSS DR4 to study the utility of the counts-in-cylinders statistic. There is a very large
scatter between the number of companions a galaxy has and the mass of its parent dark matter halo and the halo 
occupation, limiting the utility of this statistic for certain kinds of environmental studies. We also show that 
prevalent, empirical models of galaxy clustering that match observed two- and three-point clustering statistics 
well fail to reproduce some aspects of the observed distribution of counts-in-cylinders on 1, 3, and 6 $h^{-1}$ Mpc
scales. All models that we explore underpredict the fraction of galaxies with few or no companions in 3 and
6 $h^{-1}$ Mpc cylinders. Roughly 7% of galaxies in the real universe are significantly more isolated within a 6
$h^{-1}$ Mpc cylinder than the galaxies in any of the models we use. Simple, phenomenological models that map
galaxies to dark matter halos fail to reproduce high-order clustering statistics in low-density environments. 
Subject headings: cosmology: theory, large-scale structure of universe — galaxies: formation, evolution, inter- 
actions, statistics 



1. INTRODUCTION 

Measurements of galaxy environments provide a crucial 
test of large-scale structure and of the physics of galaxy 
formation. Long used as a test of cosmological models
(e.g., Blumenthal et al. 1984; Bryan & Norman 1998;
Bullock et al. 2002; Berlind et al. 2005; Berrier et al. 2006;
Blanton & Berlind 2007), environmental statistics become
more powerful probes of galaxy formation models as the 
cosmological parameters of our Universe are measured with 
higher accuracy. 

In the modern view of galaxy formation, galaxies form 
within dark matter halos. At any given epoch the relationship 
between galaxies and their dark matter halos can be described 
by a "halo occupation distribution" (HOD), which specifies 
the probability that a halo of mass M hosts N galaxies that
meet given criteria. This relationship is still quite difficult to
predict from first principles, and thus it is useful to use
measurements of various environmental statistics to empirically
constrain this distribution and to inform more physical models.
The two-point correlation function has long been among the 
most powerful tools to characterize large-scale structure (e.g.,
Peebles 1973; Kirshner et al. 1979; Davis & Peebles 1983;
de Lapparent et al. 1988; Norberg et al. 2001; Zehavi et al.
2005; Padmanabhan et al. 2007), and has frequently been
used to constrain the HOD for a given galaxy population
(Scoccimarro et al. 2001; Berlind & Weinberg 2002;
Abazajian et al. 2005; Lee et al. 2006; Zheng & Weinberg
2007).

A related way to probe the distribution of galaxy envi- 
ronments is the close-pair fraction, which has been used 
widely in observational surveys to characterize the evolution
of galaxy merger rates, enabling tests of the hierarchical
merger sequence predicted by the standard cosmological
model (Zepf & Koo 1989; Yee & Ellingson; Patton et al.
1997, 2002; Lin et al. 2004; De Propris et al. 2005, 2007).
Berrier et al. (2006) examined the evolution of
the close-pair fraction of dark matter halos in an N-body
simulation. They find that the close-pair fraction of dark matter
halos does not directly measure the merger rate of galaxies. 
However, the predicted halo close-pair counts do match the 
observed close-pair fraction of galaxies, assuming that every 
sufficiently large dark matter halo contains a galaxy. While 
these results are encouraging, tests in other regimes are nec- 
essary to assess both the underlying cosmological model and 
the manner in which galaxies are related to overdensities of 
dark matter. 

The local number density of galaxies has a well- 
established connection to the morphologies and colors 
of individual galaxies; this relationship is known as the
morphology-density relation (Oemler 1974; Dressler 1980;
Postman & Geller 1984; Park et al. 2007). There are several
methods of measuring density. Dressler (1980) uses
the 10 nearest neighbors to calculate the local surface density.
More recently, many groups use counts within spheres
(Hogg et al. 2003; Blanton et al. 2003a, 2005a) or cylinders
(Hogg et al. 2004; Blanton et al. 2006; Kauffmann et al.
2004; Barton et al. 2007) of fixed radius. As with group-finding
algorithms, this method suffers from both incompleteness
and contamination. However, it is more straightforward
and well-defined to implement number counts than a group- 
finding algorithm. 

In this paper, we further explore the utility of counts-in- 
cylinders statistics as a diagnostic of galaxy formation mod- 
els. We consider both semi-analytic galaxy formation mod- 
els (based on publicly-available catalogs from the Millen- 
nium simulation) as well as methods based on halo abun- 
dance matching. The latter type of model uses high res- 
olution dissipationless, cold dark matter simulations, com- 
bined with simple prescriptions for the galaxy-halo connec- 
tion. Such models have proved remarkably successful in 
matching several statistics of the galaxy distribution, including
two-point and three-point correlation functions (Carlberg
1991; Colin et al. 1997; Kravtsov et al. 2004; Neyrinck et al.
2004; Conroy et al. 2006; Marín et al. 2008).



The paper is organized as follows. § 2 provides an overview
of our analysis of counts-in-cylinders in the Sloan Digital Sky
Survey data. In § 3 we discuss the N-body simulations and
the phenomenological models used in our analysis. We compare
the observational and simulation results as described in
§ 4. We give our primary results in § 5. First, we demonstrate
the complementarity between the two-point correlation
function and counts-in-cylinders statistics. We then use the 
counts-in-cylinders statistic to probe galaxy environments in 
several models for the galaxy distribution. Second, we show 
that galaxies in the SDSS sample are considerably more iso- 
lated than galaxies in any of four mock galaxy catalogs that 
we consider. 

In § 6 we discuss our results and explore potential systematic
issues stemming from either our simulation analysis
or our treatment of the SDSS sample. We draw conclusions
from our analysis in § 7.

2. OBSERVATIONAL DATA: THE SLOAN DIGITAL SKY SURVEY 

We use the Sloan Digital Sky Survey (SDSS), Data Release 4
(Adelman-McCarthy et al. 2006). Specifically, we use
the Large Scale Structure subset of the NYU Value-Added
Galaxy Catalog (NYU-VAGC), compiled by Blanton et al.
(2005b). The combined spectroscopic sample in the NYU-VAGC
covers an area of 2627 square degrees, to an apparent
magnitude limit of r = 17.77. Here we use the NYU-VAGC to
create a volume-limited catalog of 27959 objects, limited to
$M_r < -19 + 5 \log h$, with redshift limits 0.0044 < z < 0.0618.

Of course, the SDSS is an incomplete redshift survey. Fiber 
collisions cause an estimated incompleteness of ~6%, all
from pairs of galaxies closer than 55" (Blanton et al. 2003c).
An additional ~ 1% of galaxies are missed because of bright 
foreground stars. As described below, we use the random cat- 
alogs provided on the NYU-VAGC website, which have the 
same geometry as the survey, to estimate the fraction of com- 
panions missed because of incompleteness, which we apply 
as a correction to the cylinder counts data. We also use the 
random catalogs to determine where the cylinder used for our 
companion counts analysis falls off the edge of the survey. 

3. SIMULATIONS 

We compare the SDSS data against several models for the
galaxy distribution, based on N-body simulations. We use
two different simulations, and for each simulation we examine
two distinct methods for matching galaxies to dark matter
host halos and subhalos; thus, we examine four distinct model
galaxy catalogs, which we will refer to by their brief names
(Z05 $V_{\rm in}$, Z05 $V_{\rm now}$, Millennium, and MPAGalaxies)
as a convenient shorthand. Each of these models is physically
motivated, though some are more strongly favored. We use them
to test different physical models and to account for the cosmic
variance we expect to find in an SDSS-sized sample. The
simulations, models, and redshift surveys used in this paper are
summarized in Table 1. Please note that the term "halos" is
used throughout this work to mean "all halos" (i.e., both host
halos and subhalos).

3.1. N-body Simulation and Substructure 

The primary simulation to which we compare the data
is a high-resolution N-body simulation, previously described
in Allgood et al. (2006), Wechsler et al. (2006), and
Berrier et al. (2006). The simulation was performed using
an Adaptive Refinement Tree (ART) N-body code
(Kravtsov et al. 1997) with a cosmology of $\Omega_m = 0.3$, $h = 0.7$,
and $\sigma_8 = 0.9$. The simulation consists of $512^3$ particles in a
comoving box of 120 $h^{-1}$ Mpc on a side, with a particle mass
of $m_p \simeq 1.07 \times 10^9\ h^{-1} M_\odot$. This simulation is used to
identify host halos, i.e., halos whose centers do not lie within the
virial radius of a larger halo. The host halos are complete down
to virial masses of $M \sim 10^{10}\ h^{-1} M_\odot$.

Substructure is included using the semi-analytic technique
described in Zentner et al. (2005). As described in
Berrier et al. (2006), we ignore substructure from the host
halos, and replace it with substructure using the semi-analytic
formalism. The number of subhalos that merge with a given
host halo is determined using the extended Press-Schechter
formalism (Somerville & Kolatt 1999). Subsequently, we
model the evolution of merged subhalos including both mass
loss processes and dynamical friction. We track the evolution
of the subhalos until their maximum circular velocities drop
below $V_{\rm max} = 80$ km s$^{-1}$. Adding this semi-analytic component
to the simulation removes inherent resolution limits,
eliminating the problem of "over-merging," which is of particular
importance for the enumeration of close pairs and companion
counts we perform.

TABLE 1
Summary of Data and Simulations

Name | Type | Size (($h^{-1}$ Mpc)$^3$) | Description | Reason Included
-----|------|---------------------------|-------------|----------------
SDSS DR4 | Redshift Survey | $3.034 \times 10^6$; 4783 square degrees | Observational data, incompleteness ~6% | Data
Z05 $V_{\rm now}$ | DM N-body + Semi-analytic Substructure | $(120)^3$ | Model using current subhalo $V_{\rm max}$ as proxy for mass to assign luminosities | Reduced resolution issues
Z05 $V_{\rm in}$ | DM N-body + Semi-analytic Substructure | $(120)^3$ | Model using accreted subhalo $V_{\rm max}$ as proxy for mass to assign luminosities | Reduced resolution issues
Millennium | DM N-body Simulation | $(500)^3$ | Use mass to assign luminosities | Large simulation used to calculate cosmic variance and check for systematic errors in $V_{\rm now}$
MPAGalaxies | Millennium + Semi-analytic Galaxy Modeling | $(500)^3$ | Use luminosities generated by model | Used to explore how more complicated galaxy assignment affects distribution

We use the maximum circular velocity of the halo, $V_{\rm max}$,
as a proxy for luminosity in what follows, where
$V_{\rm max} = \max\left[\sqrt{GM(<r)/r}\right]$. This measure is less ambiguous than
any particular mass definition. Moreover, subhalos lose mass
at their outskirts rapidly upon accretion, while halo interiors
(and thus $V_{\rm max}$) are less severely affected by this mass loss, so
this proxy is robust to mild mass loss that likely would not
affect the galaxy that resides at the subhalo center. In this work,
we use two distinct models to compare the simulation (which
we will refer to as "Z05") with observational data. The first
model uses the $V_{\rm max}$ that the subhalo has at the epoch of
interest (e.g., after it has been accreted by its host halo, evolved,
and potentially lost significant mass). We will refer to this
model as $V_{\rm now}$. Based on previous work (e.g., Conroy et al.
2006; Berrier et al. 2006), this model is not favored. However,
we note the results for the sake of completeness, and for the
sake of comparison to the Millennium simulation, which also
reports the $V_{\rm max}$ at the epoch of interest (as described in § 3.3).
The second model, the $V_{\rm in}$ model, assumes that luminosity
is related to the $V_{\rm max}$ that the subhalo had just as it was being
accreted and prior to dynamical evolution. The $V_{\rm in}$ model
has been shown to reproduce the galaxy two-point correlation
function at many epochs (Conroy et al. 2006), as well as the
galaxy close-pair fraction (Berrier et al. 2006), very well. For
subhalos, $V_{\rm now}$ is smaller than $V_{\rm in}$, because $V_{\rm in}$ characterizes
subhalos prior to the removal of mass by the interaction with
the host halo. As a result, the $V_{\rm now}$ model has fewer subhalos
above any fixed maximum velocity threshold. The average
number of subhalos for a host of a given mass for each of
these models is shown in Figure 1. These models are also
described in more detail in Berrier et al. (2006).

3.2. Identifying Galaxies with Halos 

To best compare the simulations to the data from SDSS, we 
treat the simulation output as if it were a redshift survey, by 
assigning luminosities to the dark matter halos and restricting 
the information used to that which would be obtainable from 
actual observations. 

Since the N-body substructure simulations contain no
model for luminous matter, we use the published r-band
luminosity function from the SDSS (Blanton et al. 2003b), and
assign luminosities to the subhalos to match the observed
galaxy number densities. Following Kravtsov et al. (2004),
Conroy et al. (2006), and Berrier et al. (2006), we assume a
one-to-one relation between dark matter halos (including
subhalos within larger host halos) and galaxies. Larger halos
correspond to brighter galaxies. We establish this relation as
follows: for any r-band magnitude, we integrate the luminosity
function to compute the cumulative number density of
observed galaxies brighter than this magnitude. We match this
to a halo $V_{\rm max}$ by assigning this magnitude to the $V_{\rm max}$ value
for which the cumulative number density of all halos is the
same as the cumulative number density of observed galaxies.
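The matching of cumulative number densities can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the paper: it assumes a Schechter-form luminosity function whose parameters (`phi_star`, `m_star`, `alpha`) the caller must supply, and the function names and magnitude grid limits are hypothetical choices made for this example.

```python
import math
from bisect import bisect_left

def schechter_per_mag(mag, phi_star, m_star, alpha):
    """Schechter luminosity function per unit magnitude (assumed form)."""
    x = 10.0 ** (-0.4 * (mag - m_star))
    return 0.4 * math.log(10.0) * phi_star * x ** (alpha + 1.0) * math.exp(-x)

def cumulative_density(phi_star, m_star, alpha,
                       m_bright=-25.0, m_faint=-10.0, n_grid=4000):
    """Tabulate n(<M), the number density of galaxies brighter than M."""
    step = (m_faint - m_bright) / (n_grid - 1)
    mags = [m_bright + i * step for i in range(n_grid)]
    phi = [schechter_per_mag(m, phi_star, m_star, alpha) for m in mags]
    dens = [0.0]
    for i in range(1, n_grid):
        # Trapezoidal integration, starting from the bright end.
        dens.append(dens[-1] + 0.5 * (phi[i] + phi[i - 1]) * step)
    return mags, dens

def abundance_match(vmax_values, box_volume, phi_star, m_star, alpha):
    """Give each halo the magnitude M at which n_gal(<M) = n_halo(>V_max)."""
    mags, dens = cumulative_density(phi_star, m_star, alpha)
    # Rank halos from largest to smallest V_max.
    order = sorted(range(len(vmax_values)), key=lambda i: -vmax_values[i])
    assigned = [0.0] * len(vmax_values)
    for rank, idx in enumerate(order, start=1):
        target = rank / box_volume  # cumulative halo density at this V_max
        j = min(max(bisect_left(dens, target), 1), len(dens) - 1)
        # Linear interpolation between the bracketing grid points,
        # clamped to the tabulated magnitude range.
        frac = (target - dens[j - 1]) / (dens[j] - dens[j - 1])
        frac = min(max(frac, 0.0), 1.0)
        assigned[idx] = mags[j - 1] + frac * (mags[j] - mags[j - 1])
    return assigned
```

Because the mapping depends only on rank, replacing `vmax_values` with halo masses gives the mass-based variant of the same procedure.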

For the Z05 simulation, we use both the $V_{\rm in}$ and $V_{\rm now}$
models. As discussed in Berrier et al. (2006), assigning
luminosities based on the $V_{\rm now}$ model assumes that baryons may be
stripped significantly from the galaxy as it evolves in the
potential of the larger host, thus gradually reducing the galaxy
luminosity. This model underproduces satellite galaxies in the
simulation. Alternatively, using $V_{\rm in}$ to calculate the number
density assumes that the baryons are more tightly bound than
the dark matter, and that the luminous galaxy does not lose
significant stellar mass after accretion onto the larger host.
The true evolution is likely between these extremes. The $V_{\rm in}$
model has significant observational support, but we consider
both models here for the purposes of comparison.

After the luminosities are assigned, we use all halos and
subhalos with $M_r < -19 + 5 \log h$ (which corresponds to
$V_{\rm max} = 137$ km s$^{-1}$ for the $V_{\rm in}$ model and
$V_{\rm max} = 127$ km s$^{-1}$ for $V_{\rm now}$).
"Moving" the simulation out to a distance of 500 $h^{-1}$ Mpc,
to cover the appropriate redshift space, we convert the x, y,
z coordinates to RA, Dec, and redshift. We then compute
the distance on the sky in exactly the same way as we do for
the redshift survey. We employ periodic boundary conditions
with the simulations to ensure that the cylinder we are using
never falls off the edge.

3.3. Millennium Simulation 

The Millennium simulation was performed with the
GADGET-2 code (Springel 2005) and follows the evolution of
$2160^3$ particles of mass $8.6 \times 10^8\ h^{-1} M_\odot$ in a box 500 $h^{-1}$ Mpc
on a side (Springel et al. 2005). The cosmology is comparable
to that of the Z05 simulation. In particular, the Millennium
simulation cosmology is spatially flat with $\Omega_m = 0.25$,
$h = 0.73$, and $\sigma_8 = 0.9$. Halo identification was performed
at run-time during the numerical simulation and the resultant
halo catalogs, which we use in the present study, are publicly
available. The stated completeness limit of these catalogs is
$M > 1.7 \times 10^{10}\ h^{-1} M_\odot$. For our purposes, the primary
advantage of the Millennium simulation, and the halo and galaxy
catalogs produced from it, is the large volume compared with
the Z05 models. The Millennium simulation volume is ~42
times larger than the volume-limited SDSS sample we
consider.
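As a consistency check on the quoted simulation parameters, the particle mass follows from the box size, particle number, and matter density. The sketch below assumes that the particles carry the full matter density $\Omega_m$ and takes the critical density to be $2.775 \times 10^{11}\ h^2 M_\odot$ Mpc$^{-3}$; the function name is hypothetical.

```python
RHO_CRIT = 2.775e11  # critical density in h^2 Msun / Mpc^3

def particle_mass(omega_m, box_side, n_per_side):
    """Particle mass in h^-1 Msun for a cubic box of side box_side
    (h^-1 Mpc) sampled with n_per_side**3 equal-mass particles."""
    return omega_m * RHO_CRIT * box_side ** 3 / n_per_side ** 3

# Parameters quoted in the text for the two simulations.
z05 = particle_mass(0.3, 120.0, 512)           # text quotes ~1.07e9
millennium = particle_mass(0.25, 500.0, 2160)  # text quotes 8.6e8
print(f"Z05: {z05:.3e}  Millennium: {millennium:.3e}")
```

Both results agree with the quoted particle masses to better than one percent.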

The Millennium Database does not provide $V_{\rm max}$ for all
subhalos, so we use halo mass as a proxy for galaxy luminosity in
this case. We assign luminosities as described in § 3.2, where
we rank simulated halos by their mass, observed galaxies by
their r-band luminosities, and map halo mass onto luminosity
by matching the cumulative number densities at the mass and
luminosity thresholds. For field halos, this is much like the
$V_{\rm now}$ model described above, aside from variations in $V_{\rm max}$ at
fixed mass due to variations in the internal structures of halos.
For subhalos, the relation between $V_{\rm max}$ and mass may be
significantly altered and exhibit larger scatter due to the
interactions between the satellite objects and the host potentials. We
assign luminosities by the abundance matching method, rank
ordering both halos and galaxies, so we expect that in broad
terms this assignment should be similar to the $V_{\rm now}$ model
insofar as there is little difference between rank ordering by
mass or $V_{\rm max}$. We expect these assignments to be significantly
different in the central regions of host halos, where the
influence of interactions on subhalos is large.

For further comparison we also use the MPA Galaxies
database described in De Lucia & Blaizot (2007).
De Lucia & Blaizot (2007) use a semi-analytic technique
to model the galaxies associated with the dark matter halos
of the Millennium Simulation. De Lucia & Blaizot (2007)
model the stellar components of these galaxies using the
Bruzual & Charlot (2003) stellar population synthesis model,
with the Chabrier (2003) IMF and Padova 1994 evolutionary
tracks. Galaxy mergers follow the mergers of their host dark
matter halos until the halos fall below the resolution limit of
the simulation. At that point, De Lucia and Blaizot calculate
the survival time of the galaxies using their orbits and the
dynamical friction time. Mergers result in a "collisional
starburst" modeled by the prescription in Somerville et al.
(2001).

3.4. The Two-Point Correlation Function 

As stated in § 1, for several simple and well-motivated
assignments of galaxy luminosity to dark matter halos,
predicted two-point correlation functions match observations
well, particularly on large scales (> Mpc). Figure 2 shows
the two-point functions of the simulated galaxy catalogs we
consider compared to the SDSS measurement of the galaxy
two-point function by Zehavi et al. (2005). In broad terms,
the $V_{\rm in}$ model corresponds most closely to the SDSS data over
the entire range of scales, as expected from the previous
results of Conroy et al. (2006). The $V_{\rm now}$ and Millennium DM
models deviate at relatively small separations.

The discrepancy seen at small separations between the
Millennium dark matter halos and the other mock galaxy
samples may have several causes. Most likely, this reflects the
fact that the relation between halo $V_{\rm max}$ and remaining bound
mass is significantly altered in dense environments relative to
the field. It is also possible that the simulation could suffer
from numerical dissolution of small subhalos in dense
environments. The MPAGalaxies sample is consistent with the
SDSS result above r = 0.3 $h^{-1}$ Mpc.

4. THE COUNTS-IN-CYLINDERS ENVIRONMENT STATISTIC 

4.1. General Description 

The counts-in-cylinders statistic that we use is similar to
the one used by Kauffmann et al. (2004) and Blanton et al.
(2006). We look at every halo or galaxy in the catalog and
count how many companions it has within a cylinder with a
radius defined by a given transverse separation and a depth
given by a specified line-of-sight velocity difference. We
choose a cylinder depth that is large enough to include all
physically-associated galaxies, except in the most massive
clusters. Specifically, we calculate the counts-in-cylinders for
each galaxy using four different radii ($R_c$ = 0.5, 1, 3, and
6 $h^{-1}$ Mpc), searching for companions with velocity
differences of $|\Delta V| < 1000$ km s$^{-1}$.
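The counting procedure can be illustrated with a short Python sketch. It is not the measurement code used here: for simplicity it assumes a periodic simulation box viewed in the plane-parallel approximation (line of sight along z), whereas the analysis in this paper converts positions to RA, Dec, and redshift. The function name, the brute-force pair loop, and the use of a z = 0 Hubble flow of 100 km s$^{-1}$ per $h^{-1}$ Mpc are assumptions of the example.

```python
def counts_in_cylinders(positions, los_velocities, box_size,
                        radius, dv_max=1000.0, hubble=100.0):
    """Count the companions of each object inside a cylinder along z.

    positions      : list of (x, y, z) in h^-1 Mpc
    los_velocities : peculiar velocities along z in km/s
    box_size       : side of the periodic box in h^-1 Mpc
    radius         : projected cylinder radius R_c in h^-1 Mpc
    dv_max         : half-depth of the cylinder, |dV| limit in km/s
    hubble         : Hubble flow in km/s per h^-1 Mpc (assumed z = 0)
    """
    def wrap(d):
        # Minimum-image separation in a periodic box.
        return d - box_size * round(d / box_size)

    counts = [0] * len(positions)
    for i, (xi, yi, zi) in enumerate(positions):
        for j in range(i + 1, len(positions)):
            xj, yj, zj = positions[j]
            dx, dy = wrap(xi - xj), wrap(yi - yj)
            if dx * dx + dy * dy > radius * radius:
                continue  # outside the projected radius
            # Redshift-space velocity difference: Hubble flow plus
            # the difference in line-of-sight peculiar velocity.
            dv = hubble * wrap(zi - zj) + (los_velocities[i] - los_velocities[j])
            if abs(dv) < dv_max:
                counts[i] += 1
                counts[j] += 1
    return counts
```

The pair loop scales as the square of the number of objects; a spatial grid or tree would be the natural replacement for catalogs of realistic size.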

4.1.1. Correcting for Incompleteness in the SDSS Data 

Using the same technique, we measure the counts-in- 
cylinders statistic for galaxies in our volume-limited SDSS 
NYU-VAGC sample. We search around each galaxy for all
companions within $R_c$ and $\Delta V$. The tools of the NYU-VAGC
allow us to estimate the completeness in this cylinder. Specif- 
ically, we search four of the random catalogs of galaxies 
evenly distributed throughout the survey footprint in the ap- 
parent magnitude range and separation on the sky. We weight 
each of the random galaxies by the completeness of the sec- 
tor from the "lss_geometry.dr4.fits FGOTMAIN" parameter 
and by estimating what is missing from the limiting magni- 
tude of the sector and the luminosity function of the survey 
(Blanton et al. 2005b). After applying this weight, we add the 
number of weighted random galaxies and normalize by the 
area searched on the sky. We then use the random counts as a 
measure of the local completeness of the survey, normalizing 
it to the mode for the survey and dividing by the completeness 
to arrive at an estimate of the corrected counts for a particu- 
lar galaxy. When constructing histograms of the counts-in- 
cylinders statistics, we weight each galaxy by the inverse of 
its corresponding completeness to account for missing central 
(searched) galaxies in the survey. 
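The two weighting steps described above can be sketched as follows. This is only an illustration: it assumes that a local completeness value between 0 and 1 has already been estimated for each galaxy from the random catalogs, and the rounding of corrected counts to the nearest integer bin is a choice made for this example rather than a detail given in the text.

```python
def correct_for_incompleteness(raw_counts, completeness, n_bins):
    """Return (corrected counts, weighted histogram of corrected counts).

    raw_counts   : observed companion count for each galaxy
    completeness : local completeness (0 < c <= 1) around each galaxy
    n_bins       : number of unit-width bins; the last bin is open-ended
    """
    # Divide each count by the local completeness of its cylinder.
    corrected = [n / c for n, c in zip(raw_counts, completeness)]
    hist = [0.0] * n_bins
    for value, c in zip(corrected, completeness):
        k = min(int(round(value)), n_bins - 1)  # nearest bin (assumed)
        hist[k] += 1.0 / c  # weight accounts for missing central galaxies
    return corrected, hist
```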

Our correction for incompleteness undercounts the effects 
of missed close pairs, which are significantly more likely than 
"random" to be coincident in redshift space. We test the ef- 
fects of these pairs by constructing an artificial simulation in 
which we eliminate 80% of the close pairs that would appear 
closer than 55 arcseconds on the sky, as assigned by a redshift 
distribution that matches the data. The effect is not systematic 
except at low companion counts, but even there the magnitude 
of the effect averaged over ≤ 5 companions is < 5%, 10%,
and 12% for the 1, 3, and 6 $h^{-1}$ Mpc scales, respectively.

4.2. Sample Variance 

By subdividing the Millennium simulation volume, we es- 
timate the expected sample variance among volumes of the 
size of the SDSS volume-limited sample or the Z05 simula- 
tion box to ensure that any discrepancy we see is larger than 
can be attributed to natural variations in large scale structure. 
We calculate the cylinder counts in 64 sub-volumes from the
Millennium simulation. Each sub-volume is cubic with a
side length of 125 $h^{-1}$ Mpc and has a volume comparable to
our Z05 catalogs. The histograms in Figure 3 show the
cylinder counts for four of these volumes for the $R_c$ = 3 $h^{-1}$ Mpc
cylinder (bottom), and the variation from the total Millennium
distribution (top). The error bars enclose 68% of the scatter within
a bin. We do a similar calculation for SDSS-sized volumes
within Millennium. Those results determine the errors on the
figures which include SDSS data.
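A minimal sketch of the sub-volume bookkeeping is given below. It assumes the 500 $h^{-1}$ Mpc box is split into $4^3 = 64$ cubes of side 125 $h^{-1}$ Mpc and that companion counts have already been computed for every object. The standard deviation is used here as a stand-in for the 68% scatter, and the function names are hypothetical.

```python
import statistics

def subvolume_index(position, box_size=500.0, n_side=4):
    """Index (0 .. n_side**3 - 1) of the sub-cube containing a position."""
    cell = box_size / n_side
    ix, iy, iz = (min(int(c // cell), n_side - 1) for c in position)
    return (ix * n_side + iy) * n_side + iz

def histogram_scatter(positions, counts, n_bins, box_size=500.0, n_side=4):
    """Mean and standard deviation, across sub-cubes, of the fraction of
    objects in each companion-count bin (the last bin is open-ended)."""
    hist = [[0] * n_bins for _ in range(n_side ** 3)]
    for pos, c in zip(positions, counts):
        hist[subvolume_index(pos, box_size, n_side)][min(c, n_bins - 1)] += 1
    fractions = []
    for h in hist:
        total = sum(h)
        if total:  # skip empty sub-cubes
            fractions.append([v / total for v in h])
    mean = [statistics.fmean(col) for col in zip(*fractions)]
    scatter = [statistics.pstdev(col) for col in zip(*fractions)]
    return mean, scatter
```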

The smooth blue line overlaid in Figure 3 shows the
cylinder counts result from the Z05 $V_{\rm now}$ model. We use $V_{\rm now}$
because, as mentioned in § 3.3, this is the model to which the
Millennium simulation results should be most closely related.
The $V_{\rm now}$ result falls for the most part within the error bars
of the Millennium distribution, with a $\chi^2$/(degrees of
freedom [dof]) value of 0.609. The slight disagreement at the
peak may be the result of a non-trivial relation between mass
and $V_{\rm now}$ in dense regions, an inherent shortcoming in the Z05
model, or numerical overmerging and halo incompleteness in
dense environments in the Millennium simulation. The first
of these options is expected on physical grounds and seems
most likely. The results for the $R_c$ = 1 and 6 $h^{-1}$ Mpc
cylinders show similar trends.

4.3. Companions as a Function of Host Halo Mass and Halo 
Occupation 

To explore the utility of the counts-in-cylinders statistic, we
consider it as a potential proxy for the masses of the dark
matter host halos in which the galaxies reside. Various studies
use counts-in-cylinders to test the halo model and other
environment predictors (e.g., Blanton et al. 2006). We use the
$V_{\rm in}$ model to calculate the average number of companions per
galaxy within a host halo mass bin. Figure 4 shows the
result for $R_c$ = 0.5, 1, 3, and 6 $h^{-1}$ Mpc cylinders. The error
bars are one standard deviation from the distribution within
the bin. The counts-in-cylinders statistic tracks mass and halo
occupation, but the relationships are extremely noisy. The
$R_c$ = 0.5 $h^{-1}$ Mpc cylinder can potentially distinguish between
~$10^{12}$ and ~$10^{14}\ M_\odot$ halos. The $R_c$ = 1 $h^{-1}$ Mpc cylinder
distinguishes $10^{12}\ M_\odot$ from $10^{13}\ M_\odot$ and $10^{14}\ M_\odot$. The $R_c$ =
3 and 6 $h^{-1}$ Mpc cylinders only distinguish between halos at
the extremes of the mass distribution. None of the cylinders
works well for masses below $10^{12.5}\ M_\odot$, although this scale
will vary with the magnitude limit of the sample. For masses
this small, the cylinders frequently include multiple small,
physically-distinct groups and clusters, rather than solely the
subhalos within one large host. For large companion
numbers, the smaller cylinders are ineffective because they often
do not encompass the entire group, which may have a virial
radius of ~1 $h^{-1}$ Mpc. Ideally, one would tune the cylinder
radius and depth to the halo sample of interest.

The trends in the 1 and 6 $h^{-1}$ Mpc scales may relate to
the result of Blanton et al. (2006). Blanton et al. show that
galaxy color depends much more strongly on the 1 $h^{-1}$ Mpc
density, as measured by counts-in-cylinders, than it does on
the 6 $h^{-1}$ Mpc surrounding density.

In Figure 5 we explore the relationship between the
average number of companions in a given cylinder and the actual
number of subhalos residing within the host (the halo
occupation). The solid lines in the four panels represent the case
where the number of companions is equal to the halo
occupation. We see that using a cylinder of $R_c$ = 1.0 $h^{-1}$ Mpc gives
us a very good estimate of the halo occupation for host halos
with fewer than ~40 subhalos, while a cylinder of $R_c$ = 3.0
$h^{-1}$ Mpc is reasonable for host halos with ~65-85 subhalos.

4.4. The Complementarity of Cylinder Counts and N-point
Statistics

The cylinder count statistic is related to the correlation 
function. However, there are important differences. The two- 
point correlation function describes the probability of finding 
companions within a spherical shell of a given radius from a 
galaxy, and the three-point correlation function does the same 
for three galaxies at fixed separations from each other, and 
so on for the other N-point correlators. The cylinder-counts
distribution gives the probability of finding a certain number
of companions within a cylinder of a set radius from a given
galaxy. While the average number of companions at the scale
$R_c$ is set by the integral of the two-point correlation function
over the volume of the cylinder, the distribution of companion
counts is not specified by the two-point function. For
example, the variance of the companion counts depends upon
the three-point function, the skewness of companion counts
depends upon the four-point function, and so on. In
principle, the distribution of companion counts is sensitive to all of
the N-point correlators and can be an efficient way to access
information not available through two- and three-point
statistics (or through the mean of the companion counts) without
undertaking the challenging task of computing higher-point
correlation functions.
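The statement about the mean can be written explicitly. As a sketch, assume a mean galaxy number density $\bar{n}$, a redshift-space correlation function $\xi(r_\perp, r_\parallel)$ in terms of transverse and line-of-sight separations, and a cylinder half-depth $\ell$ corresponding to the velocity limit (none of these symbols is defined in the text above). The mean number of companions is then

```latex
\langle N \rangle \;=\; \bar{n} \int_{-\ell}^{\ell} dr_\parallel
  \int_{0}^{R_c} 2\pi\, r_\perp \left[ 1 + \xi(r_\perp, r_\parallel) \right] dr_\perp .
```

Two galaxy distributions with identical $\xi$ therefore share the same $\langle N \rangle$ at fixed $\bar{n}$, while the shape of the distribution of $N$ about that mean remains free to differ.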

To illustrate the utility of cylinder-count statistics as a
complement to the correlation function, we identify two
distributions of galaxies that effectively yield the same two-point
function, but have systematically different companion number
distributions. As an example, we create a simple, toy galaxy
distribution by modifying the Z05 $V_{\rm in}$ catalog. We rearrange
the substructure so that host halos that contain at least one
subhalo with 27-37 companions are stripped of all of their
subhalo companions within a 3 $h^{-1}$ Mpc cylinder (leaving the
total mass unchanged). These subhalos are then reassigned to
a location within 1 $h^{-1}$ Mpc of a host halo with >37
companions. While this exercise has no explicit physical justification,
it does create a test catalog where halos preferentially avoid
environments with roughly 30-40 companions.

The left-hand panel of Figure 6 shows the two-point
correlation function of our toy catalog compared with the best
fit line from SDSS referenced earlier and the two-point
correlation function from the Z05 $V_{\rm in}$ model. The test catalog
correlation function falls well within the error bars of the $V_{\rm in}$
correlation function.

In contrast, the right-hand panel of Figure 6 is the histogram
of the number of companions within an $R_c$ = 3 $h^{-1}$ Mpc
cylinder for the test catalog, the $V_{\rm in}$ model, and SDSS. The figure
shows the effects of the substructure reassignment. There
is a noticeable dip in the test catalog data right around the
mean, although the mean itself does not change significantly.
Thus, the cylinder counts statistic yields different information
about the distribution of substructure on the scale of the
cylinder.

Notably, this tool gives more direct information about the 
distribution of substructure rather than the physical separa- 
tion between objects. It complements the 2-point correlation 
function as a tool for determining the accuracy of clustering 
in simulations when compared to redshift surveys. 

5. RESULTS 
5.1. Comparing the Models to the SDSS 

We now use the cylinder counts distribution described
above to compare the predictions from the simulation catalogs
to the SDSS. We calculate 1-$\sigma$ uncertainties from the cosmic
variance between SDSS-sized volumes within the Millennium
Simulation to set the error bars for the SDSS data, and use
the 1-$\sigma$ uncertainties from the cosmic variance between
Z05-sized volumes as the error bars for the Z05 $V_{\rm in}$ and $V_{\rm now}$
distributions. We summarize our results in Tables 2 and 3.

Figure 7 shows the counts-in-cylinders distribution for the
$R_c$ = 1 $h^{-1}$ Mpc cylinder on the left, and the cumulative
fraction of galaxies or halos with fewer than the given number of
companions on the right. The arrows denote the mean number
of companions for each model.

We can use the χ² statistic, computed as a summation over
the companion counts obtained in each model, to assess the
ability of each model to match the SDSS data. All values
quoted are χ²/(dof), where the degrees of freedom (dof) equal
the number of bins in the distribution. For the purposes of the
χ² calculation, all bins have a width of one companion (note
that this does not correspond to the binning in Figures 7, 8,
and 9). We ignore the correlation between different counts in
computing χ² and treat this statistic only as guidance. We find
that the Z05 V_now model produces a χ²/(dof) value of 1.002.
This is surprising, as previous studies (such as Conroy et al.
(2006)) find that V_in tends to be the type of model that best
matches two- and three-point clustering statistics. In this case,
the Z05 V_in model has a χ²/(dof) value of 131.264, while the
Millennium and MPAGalaxies distributions have χ²/(dof) =
1201.758 and 21.721, respectively. If we disregard the tail of
the distributions, and look only at the bins with more than 0
but fewer than 10 companions, we find that the Z05 V_in
value drops to 1.819, while V_now is 0.297, implying that the
Z05 models are more consistent with the SDSS distribution at
the peak. The Millennium and MPAGalaxies models are not
significantly improved by such excisions and have a χ²/(dof)
value > 1 for all cylinder radii and subsamples. The cumulative
fraction of galaxies with fewer than or equal to a given
number of companions tells a similar story.
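The statistic described above can be sketched as follows. The function name is hypothetical, and the choice of which uncertainty enters `sigma` (data, model, or a combination) is left to the caller as an assumption; as in the text, bins are treated as independent and the number of bins is used as the degrees of freedom.

```python
import numpy as np

def reduced_chi2(model, data, sigma):
    """Sum of squared, error-normalised residuals divided by the bin count."""
    model, data, sigma = (np.asarray(a, dtype=float)
                          for a in (model, data, sigma))
    chi2 = np.sum(((model - data) / sigma) ** 2)
    # dof is taken to be the number of bins; correlations between bins
    # are ignored.
    return chi2 / model.size
```

Restricting the comparison to a range of companion counts, as done for the peak of each distribution, amounts to slicing the three arrays before calling the function.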

For the sake of clarity, we do not include the Millennium
and MPAGalaxies samples in the remaining figures. We do,
however, quote their χ²/(dof) values and companion fractions
in Tables 2 and 3.

The 3 h⁻¹ Mpc cylinder produces a more problematic distribution,
as seen in Figure 8. While the general shapes of the
Z05 distributions look similar to the SDSS result, note that the
first bin is very low for both models. The χ²/(dof) values
for V_in and V_now in this case are 6.008 and 0.683, respectively.
However, if we look only at the peak (< 50 companions),
χ²/(dof) for V_in = 0.494 and χ²/(dof) for V_now = 7.379,
favoring the V_in model. On the right-hand side of the figure the
discrepancy in the first bin is plainly demonstrated. While 1.7%
of the galaxies in the SDSS data have 1 or 0 companions,
the simulations have a frequency of less than half that.

It is interesting to note that the means of the SDSS and sim- 
ulation distributions appear to be consistent with each other 
(the greatest difference from the average of the means is 7.12, 
while the standard deviation of the means from the Z05-sized 
volumes is 7.24), while the counts-in-cylinders statistics are 
not. The agreement of the means may be expected from 
the agreement in the correlation function. This result is an- 
other illustration of the complementarity of the information 
contained in the full distribution of companion counts com- 
pared to either the correlation function or the mean compan- 
ion count. 
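The connection between the mean count and the correlation function can be written out explicitly. The expression below is the standard relation for the expected number of neighbors around a galaxy, not a formula quoted from this work; the symbols $\bar{n}$ (mean number density of the sample), $V_{\rm cyl}$ (cylinder volume), $r_p$ (projected separation), and $\pi$ (line-of-sight separation) are notation introduced here for illustration.

```latex
\langle N \rangle \;=\; \bar{n} \int_{V_{\rm cyl}} \left[ 1 + \xi(r_p, \pi) \right] \, dV
```

Two catalogs with the same number density and the same two-point function therefore share the same mean companion count, while the shape of the distribution about that mean depends on higher-order clustering.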

The results for the R_c = 6 h⁻¹ Mpc cylinder are shown in
Figure 9. The underprediction by the simulations of galaxies
with few companions is more pronounced at 6 h⁻¹ Mpc,
extending to galaxies with up to 20 companions. The χ²/(dof)
values in this case for the whole distribution are 1.047 for V_in
and 0.514 for V_now.

Disregarding the first bin brings the Z05 values to χ²/(dof)
= 0.944 for V_in and 0.401 for V_now. Looking at just the peak (20
< Number of Companions < 110), we get 0.337 and 1.050 for
V_in and V_now, respectively. On the other hand, if we look only
at galaxies with fewer than 20 companions, we find χ²/(dof) =
3.292 and 3.800 for V_in and V_now, respectively. Again, the Z05
distributions are more consistent with the SDSS results at the
peak than at low densities.



6. DISCUSSION 



The comparison shows a mismatch between the galaxy assignment
in the simulations and the SDSS data when we look
at the entire distribution. Here we test the robustness of the result
to simple changes in the galaxy assignment scheme, and
also investigate possible observational effects. § 6.1, § 6.2,
and § 6.3 detail our results.

6.1. Color Modeling 

One possible reason that the simulations do not match the
SDSS data at large (R_c = 6 h⁻¹ Mpc) scales is that we do
not include any of the effects of varying mass-to-light ratios
(M/L) in the models to which we assign luminosities (the
Z05 models and the Millennium dark matter halos). It has
been demonstrated that galaxy color is related to the environment,
with galaxies in denser environments having redder
colors and higher M/L (Balogh et al. 2004; Hogg et al. 2004;
Tanaka et al. 2004; Weinmann et al. 2006; Poggianti et al.
2006; Martínez et al. 2006; Gerke et al. 2007). Here, we use
the relationships from Weinmann et al. (2006), who relate the
fraction of early, late, and intermediate galaxies to host dark
matter halo mass, and Bell & de Jong (2001), who derive a
color-dependent M/L, where redder galaxies are dimmer for
a given mass. To determine the effects of M/L on our statistic,
we use the median color as a function of parent halo mass
from Figure 11 of Weinmann et al. (2006) to assign a g - r
color to each subhalo. These colors were then used with the
color-dependent M/L for SDSS bandpasses from Bell et al.
(2003) to calculate an adjusted luminosity or an effective stellar
mass. We then found a weighted number density from
these adjusted luminosities, which was used to determine the
actual luminosity from the SDSS luminosity function.
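The sketch below shows one way such a color-dependent adjustment could enter a rank-ordered luminosity assignment. It assumes a linear relation log10(M/L) = a + b (g - r); the coefficients `a` and `b`, the `mass_proxy` input, and the `color_of_host_mass` callable are placeholders supplied by the caller, not values or functions taken from the cited papers.

```python
import numpy as np

def color_adjusted_rank(mass_proxy, host_mass, color_of_host_mass, a, b):
    """Rank subhalos by a colour-adjusted luminosity proxy.

    mass_proxy         : (N,) stand-in for each subhalo's stellar mass (assumed)
    host_mass          : (N,) parent halo mass of each subhalo
    color_of_host_mass : callable mapping host mass to a g - r colour
    a, b               : placeholder coefficients of log10(M/L) = a + b*(g - r)
    """
    g_r = color_of_host_mass(np.asarray(host_mass, dtype=float))
    mass_to_light = 10.0 ** (a + b * g_r)
    lum_adj = np.asarray(mass_proxy, dtype=float) / mass_to_light
    # Rank 0 is the brightest adjusted object; abundance matching would then
    # give rank n the n-th brightest luminosity of the luminosity function.
    order = np.argsort(-lum_adj)
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)
    return lum_adj, rank
```

With b > 0, a subhalo in a more massive (redder) host receives a lower adjusted luminosity than an identical subhalo in a less massive host, which is the sense of the shift discussed next.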

The color-corrected Z05 V_in results for the 6 h⁻¹ Mpc cylinder
are shown in the left panel of Figure 10. This analysis
uses the largest possible contrast between late and early types,
therefore causing the largest shift. While the shift in the distribution
is in the correct direction (more galaxies with fewer
companions), it only accounts for a small part of the difference
between the simulation and observational data. The lack
of simple color modeling is not the cause of this discrepancy.
The mismatch of the MPAGalaxies sample, which does include
color modeling, further supports this result.

6.2. Varying the V_max Cut

To further explore possible explanations for the mismatch
in the distributions at large scales, we try two other models
with the Z05 data. First, we vary which V_in model halos are
selected near the V_max cutoff by decreasing the V_max cutoff by
a random number between 0 and 25 km s⁻¹ for host halos,
and increasing the V_max cutoff by a random number between
0 and 100 km s⁻¹ for subhalos before calculating the number
density. This is a step away from the monotonic luminosity
assignment that we have been using, and, by allowing more
small host halos, might increase the fraction of isolated galaxies.
The resulting distribution, shown in the right panel of Figure 10,
falls in between the original V_in and V_now distributions.
While it certainly changes the shape of the distribution, the
χ²/(dof) value (3.292 for the full distribution) does not significantly
improve on the original results, nor does it increase the fraction
of halos with < 20 companions.
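A minimal sketch of a scattered velocity cut of this kind is given below. The function and argument names are hypothetical, and drawing the offsets from a uniform distribution is an assumption; the text specifies only the ranges of the random numbers.

```python
import numpy as np

def select_with_scattered_cut(v, is_subhalo, v_cut, rng=None):
    """Boolean mask of halos passing a per-object randomised velocity cut."""
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(v, dtype=float)
    is_subhalo = np.asarray(is_subhalo, dtype=bool)
    # Hosts: threshold lowered by U(0, 25) km/s.
    # Subhalos: threshold raised by U(0, 100) km/s.
    shift = np.where(is_subhalo,
                     rng.uniform(0.0, 100.0, v.size),
                     -rng.uniform(0.0, 25.0, v.size))
    return v >= v_cut + shift
```

Passing a seeded generator, for example `np.random.default_rng(0)`, makes the selection reproducible.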

The right panel of Figure 10 also shows the distribution for
a second model. Here we use the V_in velocities for subhalos
in hosts with V_max < 10¹³ M_⊙ and V_now for subhalos in hosts
with V_max ≥ 10¹³ M_⊙. This procedure assumes that galaxies
in large halos are more likely to be stripped of luminous matter
than galaxies in small halos. This distribution results in a
χ²/(dof) = 2.800. Once again, this does not correct the lack of
isolated halos in the simulations.

TABLE 2
Results of the χ² analysis (all entries are χ²/(dof))

R_c (h⁻¹ Mpc)       Z05 V_in    Z05 V_now   Millennium DM   MPAGalaxies
R_c = 1
  Full              131.264     1.002       1201.758        21.721
  0 < bin < 10      1.819       0.297       4.629           4.149
R_c = 3
  Full              6.008       0.683       5.232           23.441
  0 < bin < 50      0.494       7.379       3.011           2.604
R_c = 6
  Full              1.198       0.514       3.675           4.349
  20 < bin < 110    0.337       1.050       1.769           1.558

TABLE 3
Fraction of galaxies with fewer than the given number of companions

Number of Companions  SDSS                    Z05 V_in                 Z05 V_now                Millennium DM  MPAGalaxies
R_c = 1 h⁻¹ Mpc
  ≤ 1                 0.352 +0.016/-0.008     0.308 +0.006/-0.004      0.360 +0.006/-0.004      0.391          0.327
  ≤ 3                 0.582 +0.018/-0.010     0.497 +0.021/-0.009      0.595 +0.021/-0.009      0.664          0.530
  ≤ 5                 0.715 +0.019/-…         0.607 +0.034/-0.016      0.726 +0.034/-0.016      0.807          0.641
  ≤ 10                0.861 +0.021/-0.016     0.744 +0.050/-0.026      0.870 +0.050/-0.026      0.941          0.777
R_c = 3 h⁻¹ Mpc
  ≤ 1                 0.017 +0.002/-0.001     0.006 +0.003/-0.001      0.005 +0.003/-0.001      0.005          0.008
  ≤ 3                 0.070 +0.006/-0.005     0.036 +0.012/-0.004      0.035 +0.012/-0.004      0.035          0.045
  ≤ 10                0.310 +0.020/-0.012     0.239 +0.025/-0.013      0.258 +0.025/-0.013      0.273          0.254
  ≤ 50                0.900 +0.024/-0.018     0.786 +0.032/-0.021      0.887 +0.032/-0.021      0.943          0.776
R_c = 6 h⁻¹ Mpc
  ≤ 2                 0.002 +0.0004/-0.0003   0.0003 +0.0002/-0.0002   0.0001 +0.0002/-0.0002   0.00007        0.0003
  ≤ 5                 0.009 +…/-0.0009        0.002 +0.001/-0.0006     0.001 +0.001/-0.0005     0.001          0.004
  ≤ 50                0.485 +0.014/-0.013     0.378 +0.023/-0.013      0.416 +0.023/-0.013      0.440          0.369
  ≤ 110               0.872 +0.018/-0.016     0.719 +0.027/-0.016      0.819 +0.027/-…          0.890          0.706

6.3. Isolated Galaxies 

The consistent feature in the cylinder counts distribution for
all of the models is the relative paucity of isolated galaxies
compared to the observed frequency of isolated galaxies.
To verify that this is not an artifact of the survey footprint,
we examine a possible cause in our analysis of the SDSS data
for the R_c = 6 h⁻¹ Mpc cylinder counts, where the effect is
most pronounced. First, we check the images of the twenty-one
galaxies in SDSS flagged as having no companions. We
find that ten are within 6 h⁻¹ Mpc of an edge. While we do
correct for edge effects in our analysis, we also recompute the
fraction of isolated galaxies without those ten as a conservative
estimate. Adopting the conservative estimate of eleven
isolated galaxies in the SDSS distribution, we expect to find
eighteen isolated halos in the V_in sample. We only find seven.
Assuming Poisson statistics, the probability that we should
find so few isolated galaxies in the V_in sample is ≈ 2 × 10⁻³
(this compares to ∼ 10⁻⁹ if all twenty-one isolated galaxies
are included). The probabilities for the other models are all
lower by at least one order of magnitude. We conclude that
the difference in isolated galaxy fraction is not caused by
unaccounted-for edge effects.
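The order of magnitude of the quoted probability can be checked with a short calculation. The sketch below assumes the relevant quantity is the cumulative Poisson probability of finding seven or fewer objects when eighteen are expected; the exact definition used for the quoted value is not stated, so this is only a consistency check.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson-distributed X with mean lam."""
    return exp(-lam) * sum(lam ** i / factorial(i) for i in range(k + 1))

# Seven isolated halos found where eighteen are expected.
p = poisson_cdf(7, 18.0)
print(f"P(X <= 7 | mean 18) = {p:.2e}")
```

Under these assumptions the result is a few times 10⁻³, the same order as the probability quoted above.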

7. CONCLUSIONS 

We have explored the counts-in-cylinders statistic, and used 
it to compare galaxy environments in the SDSS with environ- 
ments measured in several models for the galaxy distribution 
based on dark matter simulations. We show that this statis- 
tic provides different information than the two-point function 
alone; it is possible for two catalogs to have similar two-point 
correlation functions, but companion distributions with very 
different shapes. 

Our primary results are as follows. 

1. There is a large scatter in the number of companions a
dark matter halo of a given host mass or halo occupation
has within a set cylinder. The counts-in-cylinders
statistic is limited as a tool for determining the host halo
mass of a galaxy.

2. We considered several models for assigning galaxies
to the dark matter distribution, including models based
on abundance matching to dark matter substructures as
well as a semi-analytic model from the Millennium simulation.
Each of these models significantly underpredicts
the number of galaxies with very few companions on
R_c = 3 and 6 h⁻¹ Mpc scales.

3. While none of the simulations or models examined have
a counts-in-cylinders distribution that is consistent with
that of SDSS data, the two abundance matching models
(Z05 V_now and V_in) have similar distributions when
the first few, very discrepant, bins (corresponding to the
most isolated galaxies) are ignored.

4. The counts-in-cylinders test fails for models that match 
the two- and three-point correlation functions, high- 
lighting its utility as a diagnostic. 

We have tested the robustness of these results to a series of
possible systematic errors. Simple changes to the color assignment
or to the scatter model in the abundance matching
approach do not change the conclusions. We have accounted
for known effects in the completeness of the SDSS NYU-VAGC
data. In addition, it is hard to see how any small-scale
incompleteness would explain the discrepancy seen on scales of
several Mpc.

Our results indicate that some observed galaxies in the 
real universe are significantly more isolated than any halos 
of comparable size. It does not appear that the discrepancy 
we have identified in the counts-in-cylinders can be easily re- 
solved with a standard halo occupation approach that assumes 
that all of a galaxy's properties are set by the mass of its host 
halo. In any case, this mismatch merits further study. 
FIG. 2. — Two-point correlation functions for models and data. The straight line is the power law fit to SDSS from Zehavi et al. (2005), ξ(r) = (r/r₀)^(-γ), with
r₀ = 5.59 ± 0.11 h⁻¹ Mpc and γ = 1.84 ± 0.01, while the black stars represent the actual data (I. Zehavi, private communication). The magenta triangles are the
correlation function from the V_in model, the blue stars are from V_now, the solid red squares are the Millennium simulation dark matter halos, and the open cyan
squares are the Millennium simulation galaxies. The error bars are calculated by jackknifing over octants of the simulation for the Z05 models and over similarly
sized subvolumes of the Millennium sample.
FIG. 3. — Comparison of counts-in-cylinders between the Millennium simulation and the Z05 model (smooth blue line) for the 3 h⁻¹ Mpc cylinder. Bottom
panel: Distribution of counts in R_c = 3 h⁻¹ Mpc cylinders. The arrows denote the mean of each distribution. Top panel: Fractional deviation from the mean
cylinder counts. The shaded area denotes the dispersion among the 64 sub-volumes of the Millennium simulation and provides a guideline for the statistical
limitations of the comparison.
FIG. 4. — Clockwise from top left: Average number of companions in the Z05 V_in sample within R_c = 0.5, 1, 6, and 3 h⁻¹ Mpc cylinders for galaxies within a
given host halo mass bin. Shaded regions indicate the 1-σ scatter in the value for each bin. Note that the y-axes have different scales.
FIG. 5. — Clockwise from top left: Average number of companions in the Z05 V_in sample within R_c = 0.5, 1, 6, and 3 h⁻¹ Mpc cylinders for galaxies of a given
halo occupation. Solid lines correspond to (halo occupation) = (number of companions) for comparison. Shaded regions indicate the 1-σ scatter in the value for
each bin.
FIG. 6. — Top left: Two-point correlation function for the artificial catalog (green "x"s) and the Z05 V_in model (magenta triangles) with the best fit line from
SDSS. Error bars represent jackknife plus Poisson errors. Bottom left: the percent difference between the artificial catalog and the V_in model. Right: Histogram
of the fraction of galaxies or halos with a given number of companions within an R_c = 3 h⁻¹ Mpc cylinder for the test catalog (green long-dash-short-dash line),
the Z05 V_in data (magenta dashed line) and SDSS (black solid line).
FIG. 7. — Bottom left: Histogram of the fraction of galaxies or halos with a given number of companions within the R_c = 1 h⁻¹ Mpc cylinder for Z05 V_in
(magenta long-dashed line), V_now (blue dot-dashed line), Millennium (red short-dashed line), MPAGalaxies (cyan short-dashed-long-dashed line), and SDSS
(smooth solid black line). The arrows show the average number of companions for each distribution. Top left: Cylinder counts with the SDSS distribution
subtracted. The shaded area is 1-σ from cosmic variance between SDSS-sized volumes, while the error bars include the cosmic variance between Z05-sized
volumes. Right: Cumulative fraction of galaxies or halos with fewer than the given number of companions within the R_c = 1 h⁻¹ Mpc cylinders.
FIG. 10. — Left: Cylinder counts for SDSS (smooth solid black line), Z05 V_in (magenta long-dashed line), and Z05 V_in with basic color correction (green
short-dashed line) within 6 h⁻¹ Mpc cylinders. Right: Cylinder counts for SDSS (smooth solid black line), Z05 V_in (magenta long-dashed line), Z05 V_now (blue
dot-dashed line), a combination of V_in and V_now (red short-dashed line), and V_in with scatter in the V_max cut (cyan long-dashed-short-dashed line) within R_c = 6
h⁻¹ Mpc cylinders.



